Search Result

Malicious code classification algorithm based on multi-feature fusion

LANG Dapeng, DING Wei, JIANG Haocheng, CHEN Zhiyuang

Journal of Computer Applications 2019, 39 (8): 2333-2338. DOI: 10.11772/j.issn.1001-9081.2019010116

Most research on malicious code classification addresses family classification or the separation of malicious from benign code, while classification by category has received relatively little attention. To address this gap, a malicious code classification algorithm based on multi-feature fusion was proposed. Three sets of features extracted from texture maps and disassembly files were fused for classification. Firstly, gray level co-occurrence matrix features were extracted from source files and disassembly files, and operation code sequences were extracted by the n-gram algorithm. Secondly, an improved Information Gain (IG) algorithm was used to extract the operation code features. Thirdly, Random Forest (RF) was used as the classifier to learn the multiple feature groups after normalization. Finally, a random forest classifier based on multi-feature fusion was realized. In training and testing on nine types of malicious code, the proposed algorithm achieved 85% accuracy. It is more accurate than random forest with a single feature, multi-layer perceptron with multiple features, and the Logistic regression classifier.
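
The opcode n-gram and information gain steps can be illustrated with a minimal sketch. It uses the standard IG definition, not the improved variant the abstract refers to (which is not described here), and the opcode sequences, class labels, and function names are hypothetical.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Return the n-grams (as tuples) of an opcode sequence, in order."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(samples, labels, gram):
    """Standard IG of the binary feature 'gram occurs in the sample'."""
    present = [y for s, y in zip(samples, labels) if gram in s]
    absent = [y for s, y in zip(samples, labels) if gram not in s]
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional


# Hypothetical toy data: four disassembled programs in two categories.
programs = [
    ["push", "mov", "call", "ret"],
    ["push", "mov", "call", "pop"],
    ["xor", "jmp", "xor", "jmp"],
    ["xor", "jmp", "nop", "ret"],
]
labels = ["trojan", "trojan", "worm", "worm"]

# Each program is represented by the set of 2-grams it contains.
samples = [set(opcode_ngrams(p, 2)) for p in programs]
ig_scores = {g: information_gain(samples, labels, g)
             for g in set().union(*samples)}
```

Keeping only the highest-scoring n-grams in `ig_scores` would give the opcode feature group that is then normalized and passed to the classifier together with the texture features.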

Base station traffic prediction model based on spatial collaboration

PENG Duo, ZHOU Jianguo, YI Shuwen, JIANG Hao

Journal of Computer Applications 2019, 39 (1): 154-159. DOI: 10.11772/j.issn.1001-9081.2018061330

AutoRegressive Integrated Moving Average (ARIMA) models and Long Short-Term Memory (LSTM) units do not exploit the collaboration between Base Stations (BSs) in traffic prediction. To address this problem, a new method called Traffic Prediction based on Space Collaboration (TPBC), which uses the collaboration between BSs produced by users, was proposed. Firstly, a BS cooperative network was constructed based on the collaboration between BSs and then divided into multiple communities. Next, the cooperative BSs with the closest relationships to the target BS in the same community were found via the Granger causality test. Finally, a hybrid neural network was constructed from an LSTM and an Embedding layer, and the historical traffic of the target BS and each cooperative BS was used to predict the traffic of the target BS. The experimental results show that the Root Mean Square Error (RMSE) of TPBC is 29.19% and 27.47% lower than that of ARIMA and LSTM respectively, indicating that TPBC effectively improves the accuracy of BS traffic prediction, which benefits traffic offloading and energy saving.
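
For reference, the RMSE comparison reported above can be reproduced in form with a short sketch. The traffic values and predictions below are hypothetical and are not taken from the paper; only the metric and the relative-reduction arithmetic are illustrated.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def relative_reduction(baseline_rmse, model_rmse):
    """Percentage by which model_rmse is lower than baseline_rmse."""
    return (baseline_rmse - model_rmse) / baseline_rmse * 100.0


# Hypothetical traffic values for one base station over four time steps.
actual = [10, 12, 14, 16]
baseline_prediction = [12, 14, 16, 18]   # stand-in for a baseline model
model_prediction = [11, 13, 15, 17]      # stand-in for an improved model

baseline_error = rmse(actual, baseline_prediction)
model_error = rmse(actual, model_prediction)
reduction = relative_reduction(baseline_error, model_error)
```

A reduction figure such as 29.19% is this ratio computed between the baseline's RMSE and the proposed model's RMSE on the same test data.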

Representative-based ensemble learning classification with leave-one-out

WANG Xuan, ZHANG Lin, GAO Lei, JIANG Haokun

Journal of Computer Applications 2018, 38 (10): 2772-2777. DOI: 10.11772/j.issn.1001-9081.2018041101

To address the effect of non-uniform sampling, a Leave-One-Out Ensemble Learning Classification Algorithm (LOOELCA) for symbolic data classification was proposed, building on the representative-based classification algorithm. Firstly, n small training sets were obtained by the leave-one-out method, where n is the size of the initial training set. Then, independent representative-based classifiers were built from these training sets, and the classifiers and objects involved in misclassification were marked. Finally, the marked classifiers and the original classifier formed a committee to classify the test set objects. If the committee vote was unanimous, the test object was labeled directly with the agreed class; otherwise, the test object was classified by the k-Nearest Neighbor (kNN) algorithm using the marked objects. The experimental results on the UCI standard datasets show that the accuracy of LOOELCA is on average 0.35 to 2.76 percentage points higher than that of Representative-Based Classification through Covering-Based Neighborhood Rough Set (RBC-CBNRS); LOOELCA also achieves higher classification accuracy than ID3, J48, Naïve Bayes, OneR and other methods.
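
The leave-one-out split and the committee decision rule can be sketched as follows. This is only an illustration under stated assumptions: the representative-based classifier itself is not implemented, the committee votes are passed in directly, and the Hamming distance, the value of k, and all function names are hypothetical choices rather than details from the paper.

```python
from collections import Counter


def leave_one_out_sets(training_set):
    """Yield (held_out, remaining) pairs: n training sets of size n - 1."""
    for i, held_out in enumerate(training_set):
        yield held_out, training_set[:i] + training_set[i + 1:]


def hamming(a, b):
    """Number of positions at which two symbolic objects differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_label(query, labeled_objects, k=3):
    """Majority label among the k labeled objects nearest to the query."""
    nearest = sorted(labeled_objects, key=lambda obj: hamming(query, obj[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def committee_label(votes, query, marked_objects, k=3):
    """Use the committee's label if unanimous, otherwise fall back to kNN."""
    if len(set(votes)) == 1:
        return votes[0]
    return knn_label(query, marked_objects, k)


# Hypothetical symbolic data: (attribute tuple, class label).
training = [
    (("red", "round"), "apple"),
    (("red", "oval"), "apple"),
    (("yellow", "long"), "banana"),
    (("yellow", "oval"), "banana"),
]

splits = list(leave_one_out_sets(training))
unanimous = committee_label(["apple", "apple"], ("red", "round"), training)
disputed = committee_label(["apple", "banana"], ("yellow", "long"), training)
```

In this sketch the whole training set stands in for the marked objects; in the described algorithm only the objects marked during the leave-one-out stage would be passed to the kNN fallback.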

On-line forum hot topic mining method based on topic cluster evaluation

JIANG Hao, CHEN Xingshu, DU Min

Journal of Computer Applications 2013, 33 (11): 3071-3075.

Hot topic mining is an important technical foundation for monitoring public opinion. Because current hot topic mining methods cannot handle the effect of word noise and evaluate topic hotness in only a single way, a new mining method based on topic cluster evaluation was proposed. Forum data were first modeled by the Latent Dirichlet Allocation (LDA) topic model and topic noise was cut off; the data were then clustered by K-means++, which improves the selection of cluster centers. Finally, clusters were evaluated in three aspects: abruptness, purity and attention degree of topics. The experimental results show that both cluster quality and clustering speed improve when the topic noise threshold is set to 0.75 and the cluster number to 50. Tests on real data sets also demonstrated the effectiveness of ranking clusters by their probability of containing a hot topic. Finally, a method for displaying hot topics was developed.
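
The center selection step of K-means++ can be sketched as below. This is the standard seeding procedure, shown on hypothetical topic-distribution vectors; the function names and the fixed random seed are assumptions for illustration, and the noise cut-off and the cluster evaluation described above are not covered.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_centers(points, k, seed=0):
    """Pick k initial centers from distinct points with K-means++ seeding."""
    if k > len(set(points)):
        raise ValueError("k exceeds the number of distinct points")
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest center,
        # so far-away points are more likely to become the next center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Hypothetical document-topic vectors (three topics per post).
posts = [
    (0.9, 0.05, 0.05),
    (0.85, 0.1, 0.05),
    (0.1, 0.8, 0.1),
    (0.05, 0.9, 0.05),
    (0.1, 0.1, 0.8),
    (0.05, 0.05, 0.9),
]

centers = kmeans_pp_centers(posts, k=3)
```

Points already chosen have weight zero and cannot be picked again, so the returned centers are always distinct; ordinary K-means iterations would then start from these centers.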